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ABSTRACT 

Spectral measurements of the 21 cm monopole background have the promise of revealing the bulk energetic 
properties and ionization state of our universe from z ~ 6 — 30. Synchrotron foregrounds are orders of 
magnitude larger than the cosmological signal, and are the principal challenge faced by these experiments. 
While synchrotron radiation is thought to be spectrally smooth and described by relatively few degrees of 
freedom, the instrumental response to bright foregrounds may be much more complex. To deal with such 
complexities, we develop an approach that discovers contaminated spectral modes using spatial fluctuations of 
the measured data. This approach exploits the fact that foregrounds vary across the sky while the signal does 
not. The discovered modes are projected out of each line-of-sight of a data cube. An angular weighting then 
optimizes the cosmological signal amplitude estimate by giving preference to lower-noise regions. Using this 
method, we show that it is essential for the passband to be stable to at least ~ 10~ 1 2 3 4 . In contrast, the constraints 
on the spectral smoothness of the absolute calibration are mainly aesthetic if one is able to take advantage of 
spatial information. To the extent it is understood, controlling polarization to intensity leakage at the ~ 10 2 
level will also be essential to rejecting Faraday rotation of the polarized synchrotron emission. 

Subject headings: dark ages, reionization, first stars - methods: data analysis - methods: statistical 


1. INTRODUCTION 

One of the richest yet least understood narratives in cosmol- 
ogy is the formation of the complex structure that we see to- 
day out of the simple initial conditions implied by the cosmic 
microwave background (CMB). The first luminous objects are 
thought to have formed at 2 ~ 20 — 30 through collapse in 
10 6 — 10 8 M 0 halos (Barkana & Loeb 2001; Bromm 2013). 
The radiation from these objects heated and then reionized 
the intergalactic medium (IGM). There are several sources of 
complementary information about the evolution of ionization 
in this epoch. 

The CMB temperature anisotropy is damped by the total 
Thomson depth to free electrons. The Planck collaboration 
has used this effect, combined with a constraint on the scalar 
amplitude from gravitational lensing, to measure the op- 
tical depth through reionization (Planck Collaboration et al. 
2013). In addition, Thomson scattering through the reion- 
ization epoch generates a unique polarization signature on 
large angular scales. WMAP has measured the total optical 
depth using this polarization signature (Bennett et al. 2013). 
These are integral constraints on the free electron abundance, 
and can be translated into a central reionization redshift of 
10.6 ± 1.1 (Bennett et al. 2013). 

Once the IGM is highly ionized, it is transparent to Lyman- 
a photons. Absorption measurements along sight lines to 
high-redshift quasars indicate that reionization must have 
ended by z < 6 (Fan et al. 2006). Absorption saturates at 
low abundance, so these should be taken as bounds on the end 
of reionization, which could still have been largely complete 
at redshifts higher than 6. 
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Recently, two methods have been developed to place much 
more direct bounds on the duration of reionization. The 
ionization process is thought to be spatially patchy, as lo- 
cal sources of radiation blow ionized bubbles that coalesce 
into the fully-reionized IGM. CMB photons scattering from 
this patchy screen produce an additional kinetic Sunyaev- 
Zel’dovich anisotropy appearing most clearly at i > 3000, 
where the primary CMB is negligible (Knox et al. 1998). Up- 
per limits on this effect translate into a model-dependent up- 
per bound on the duration of reionization (Zahn et al. 2012), 
and hold the promise of direct detection of patchy structure in 
the near future. 

The patchy structure of reionization can also be observed 
directly in three dimensions using emission of neutral hydro- 
gen through its 21 cm line (Furlanetto et al. 2006). Recent 
bounds from GMRT (Paciga et al. 2013), MWA (Dillon et al. 
2014) and PAPER (Parsons et al. 2013; Pober et al. 2013) are 
marching down to the expected level of fluctuations, in par- 
allel to efforts at LOFAR (van Haarlem et al. 2013). An al- 
ternative to measuring the 21 cm anisotropy is to measure the 
signal of its global emission (or absorption at earlier times) 
(Pritchard & Loeb 2008, 2010), which reveals the bulk ener- 
getic properties and ionization state of the universe during 
reionization and preceding epochs when the first luminous 
structures were forming. Global 21 cm experiments include 
EDGES (Bowman & Rogers 2010) and SCI-HI (Voytek et al. 
2014) (which have both reported bounds), LEDA 5 and the 
proposed DARE mission (Burns et al. 2012). 

The frequencies of interest in these global studies are ~ 
50 — 200 MHz, and fiducial theoretical models suggest a max- 
imum contrast of ~ 100 mK relative to the synchrotron emis- 
sion of the galaxy, which can vary ~ 10 2 — 10 5 K across the 
sky and frequency range. Astrophysical synchrotron emis- 
sion is thought to be fully described by a handful of spectrally 
smooth functions that can be distinguished from the variation 
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of the global reionization signal. 

Extremely bright foregrounds make the measurement sus- 
ceptible to instrumental systematics. For example, if an in- 
strument with a 1% ripple in one spectral band observed a 
500 K power law, subtraction of a pure power law would leave 
a 5 K residual, significantly larger than the signal. Through 
careful instrumental design, the level of instrumental system- 
atics may be controlled, but generally cannot be nulled en- 
tirely. 

Here, we develop methods that can be used to con- 
strain the cosmological signal in these heavily contaminated 
data. Following the monopole nature of the signal, experi- 
ments to-date have mapped the sky with very large beams 
(Bowman & Rogers 2010; Voyteketal. 2014). However, a 
unique trait of the foregrounds is that they vary across the 
sky, while the signal is constant. This regime is the oppo- 
site of the situation normally found in analysis of small sig- 
nals, where a modulated signal is pulled out of foregrounds. 
Liu et al. (2013) (henceforth L13) proposed that experiments 
seeking to measure the global signal should also resolve the 
sky with an instrumental beam. This allows selective weight- 
ing against regions of high contamination, and allows for the 
use of angular correlation information to reject foregrounds. 
The additional spatial resolution yields higher fidelity recov- 
ery of the cosmological 21 cm spectrum. 

Here, we extend the ideas in LI 3 to a method that uses 
the spatial fluctuations of foregrounds in the data to discover 
contaminated spectral modes. A similar idea has been em- 
ployed successfully in 21cm large scale structure measure- 
ments (Switzer et al. 2013; Masui et al. 2013; Chang et al. 
2010) at 2 ~ 1, and has been suggested for cleaning iono- 
spheric contamination (Vedantham et al. 2014). 

Discovery of foreground spectral modes in the measured 
data makes the method more robust to assumptions about 
the foreground contamination. For example, now if the in- 
strumental passband has a 1% ripple, the largest foreground 
mode discovered in our foreground cleaning method will also 
self-consistently exhibit this ripple. Generally, instrumen- 
tal systematics take relatively clean and smooth functions of 
frequency from synchrotron emission and convert them into 
more complex structure that requires additional spectral func- 
tions to describe. We argue that the primary goal in instru- 
mental design should be to prevent proliferation of bright, new 
foreground modes in the data. Each new foreground degree 
of freedom produced by instrumental response to foregrounds 
results in more signal loss and makes discovery of the signal 
more ambiguous. 

The methods described here of 1) using spatial variation to 
discover spectral foreground modes, which can then be pro- 
jected out, and 2) downweighting known spatial areas of high 
contamination (the galaxy) provide the strongest methods for 
recovering the global 21 cm signal in the absence of additional 
prior information about the foregrounds or instrumental re- 
sponse. While the algorithm of mode subtraction and angular 
weighting is intuitive, we develop it from the ground up to 
expose several implicit choices and possible pitfalls. 

Recently, Bernardi et al. (2014) argued that a dipole gain 
pattern can be calibrated in an interferometric array. How- 
ever, additional variations in spectral response due to factors 
such as the analog to digital converter, reflection and signal 
loss after the antenna were not included. Our goal here is to 
understand how data analysis can be made more robust to this 
class of instrumental response (or any other source of fore- 
ground covariance), or alternately how tightly certain instru- 


mental tolerances must be constrained. 

In Sec. 2 we review the basic properties of the global signal 
and describe our foreground model. Sec. 3 builds up the esti- 
mator for joint foreground and signal estimation using a sim- 
plified model of spectra along independent sightlines. Sec. 4 
considers implications of this model for passband calibration. 
Sec. 5 develops spatial weights, and Sec. 6 combines the es- 
timators with spatial and spectral weights. Sec. 7 describes 
a number of considerations for using the methods developed 
here and for global 21cm signal estimation in general. We 
discuss telescope beamwidth, the foreground monopole, how 
aggressively foreground modes should be removed, and sus- 
ceptibility to Faraday rotation. We also consider mode re- 
moval of the pre-reionization absorption feature and exten- 
sions of the simple template amplitude constraint considered 
throughout. We summarize our conclusions in Sec. 8. 


2. THE GLOBAL SIGNAL AND FOREGROUND 
MODELS 

From radiative transport arguments, the brightness temper- 
ature of 21 cm radiation is 
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where Xi is the mean ionization fraction, T$ is the spin tem- 
perature of the hyperfine transition, and Tcmb is the CMB 
temperature. The basic physics of the spin-temperature cou- 
pling is well-understood (Pritchard & Loeb 2010), but the 
detailed astrophysical processes that determine the coupling 
strength and the gas temperature are still conjectural. 

For z > 200, the universe is dense enough that electron 
collisional interactions drive the spin temperature to the gas 
temperature, which is cooling faster than the CMB. This pro- 
duces absorption. By z ~ 30, the universe is sufficiently rar- 
efied that the spin temperature is better coupled to the CMB 
bath, and absorption is expected to subside. Once the first lu- 
minous objects form, these produce radiation that drives the 
Wouthuysen-Field (Wouthuysen 1952; Field 1958) coupling 
of Ts again to the gas temperature, leading to a second ab- 
sorption feature. Then, X-ray heating of the gas can drive 
the spin temperature above the CMB temperature, leading to 
emission. As these luminous processes proceed and increase, 
they also ionize the IGM, which causes the signal to disappear 
as 1 — Xi — > 0 (Pritchard & Loeb 2010). 

We will find it convenient to have a reionization model with 
a small number of parameters, rather than a full spectrum. For 
concreteness, we will spend most of the paper focusing on 
the evolution of the brightness temperature as the universe is 
re-ionized, rather than the absorption dip as heating begins. 
However, our methods are applicable at all redshifts, and in 
Sec. 7.6 we will briefly discuss the pre -reionization dip. 

At the beginning of the reionization epoch, it is thought that 
the spin temperature is strongly coupled to the gas, and that 
the gas is heated, driving 21cm emission. As the ionization 
proceeds, this emission dies away. A simple way of parame- 
terizing this is (Pritchard & Loeb 2010) 
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so that the brightness temperature scales as (Fig. 1) 
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Figure 1. Brightness temperature of the global 21 cm emission in a simpli- 
fied two-parameter model of reionization. The central redshift is taken from 
the WMAP9 constraint at z = 10.6, and several representative values for the 
duration of reionization are shown. 
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Figure 2. Foreground model at 80 MHz, displayed as log(T). 


where we have assumed that T$ Tqmb- 

Our model for the diffuse intensity (Stokes-I) of the fore- 
grounds is based on an extended version of the Global Sky 
Model (GSM) of de Oliveira-Costa et al. (2008) that is de- 
veloped in L13. The original GSM used a principal com- 
ponent analysis to extrapolate and interpolate between pre- 
vious galactic surveys at a wide range frequencies. It is based 
on 3 spectral components and is accurate down to rs 10°. 
LI 3 extends this model by adding additional (less smooth) 
spectral eigenmodes, as well as a population of power-law 
sources with randomly distributed spectral indices, drawn 
from the dN/dS brightness distribution of Di Matteo et al. 
(2002). The goal of adding these components is to boost the 
rank of the foreground covariance to reflect spectral fluctua- 
tions that can be expected on the real sky. These cannot be 
probed in the original rank-3 GSM. An example map of our 
foreground model at 80 MHz is shown in Fig. 2. 

In Sec. 7.7, we will additionally consider the impact of 
Faraday rotation and polarized foregrounds. As this is some- 
what peripheral to the development of our data analysis meth- 
ods in the next few sections, we will defer our discussion of 
our foreground polarization model until then. 


3. GLOBAL 21 CM OBSERVATIONS ALONG 
INDEPENDENT LINES OF SIGHT 

A global 21 cm experiment that maps the sky will have a 
finite number of resolution elements across its field, defined 
by the beam. In each one of these resolution elements, it ob- 
serves the spectrum of incoming radiation. A reasonable ap- 
proximation is to split the survey into Ng spectra of length 
N u along all the angular resolution elements. Let y i be that 
vector, where i indexes the sight line (1 to Ng) and the length 
of the vector is N v . In this section, we will build up infras- 
tructure and intuition using this simplified case of Ng inde- 
pendent and identically-distributed sight lines. Sec. 6 extends 
this to a method that treats realistic foregrounds with proper 
angular correlations. Because angular correlations become 
largely irrelevant when considering wide-beam experiments 
with essentially no angular sensitivity, much of the intuition 
developed here is directly transferable to the analysis of such 
experiments. 

In writing an estimator, there are many choices for aspects 
of the global signal that could be estimated. The goal could 
be to constrain 1) an arbitrary spectrum of the cosmological 
21 cm evolution, 2) some modes of its variation that relate to 
physical parameters, 3) some amplitudes that are based on ex- 
ternal information about the expected redshift of reionization, 
4) the amplitude of a template of a provisional global 21 cm 
signal. For our purposes here, it is simplest to develop esti- 
mators for the amplitude of an assumed 21 cm signal template. 
Initially, experiments will simply be stepping down in bounds 
and seeking some evidence of 2 ~ 20 absorption and heat- 
ing or a z ~ 11 reionization signal — an amplitude constraint 
could provide the simplest, clear indication of a cosmological 
signal. Sec. 7.5 considers other estimation regimes in light of 
the methods developed here. 

Let the assumed template of the global signal be a vector x 
of length N v and normalized to have a maximum of 1. Mul- 
tiply by some amplitude a to get the 21 cm signal. Then the 
observed spectrum is the sum of signal, thermal noise n.i nst ,j 
and foregrounds , 

Ui = ax + ra fgji + n insM . (4) 

Initially we will assume that the foregrounds and noise 
are identically and independently normally distributed along 
each line of sight i, n f gi j ~ 1V(0, Sf g ) and rii ns t,i ~ 
N(0, Si ns t). In reality, the foreground field is strongly cor- 
related in angle and non-Gaussian. These issues will be ex- 
amined in subsequent sections. In contrast, thermal noise 
is uncorrelated between sight lines and normally distributed: 
n inst,i ~ iV(0,£ inst ) will remain an excellent approxima- 
tion. Throughout, the N v x N v matrix S = S illst + Xf g will 
refer to the total [y, v') covariance. 

3.1. Known covariance 

If the foregrounds and thermal noise are drawn from a total, 
known covariance E, then the maximum likelihood estimate 
for the template amplitude is 

«ml = {x t 'E~ 1 x)~ 1 x t T ] -1 y (5) 

where 

N e 

V = Ng 1 ^y i (6) 

2=1 

is the mean spectrum along all lines of sight. 


4 


Switzer & Liu 


This can be understood as “deweighting” the foregrounds 
(S _1 y), projecting onto the signal template (.r 7 £ ] y) and 
then applying a normalization to account for the weights. The 
estimated amplitude is normally distributed and its error is 

Var(a M L)|s = (x T 'S^ 1 x)~ 1 . (7) 

To convert this into a more intuitive cleaning process, take 
the eigendecomposition of the covariance £ = V AV 1 . The 
cleaning operation £ ~ x y is then V A ~ 1 V T y. Here V T 
projects the data onto a basis where the contaminated modes 
Vj (the j’th row of V) are orthonormal. Those modes are 
weighted against XJ 1 = [A -1 ]^, where the highest vari- 
ance, most contaminated modes are down-weighted. Then, 
V projects this down-weighted basis back onto the spectral 
basis. 

In the usual assumptions, the foregrounds are not full rank, 
even if the spectral covariance £ — which contains instrumen- 
tal noise — is full rank. Often, the foregrounds are instead de- 
scribed as some set of contaminated modes Vj where j ranges 
from 1 to the number of contaminated modes Nf s , or 

iV fg 

rr f ( .. i — 'y ^ Oj jVj . (8) 

j 

If the foreground covariance is described by only Nf g 
highly contaminated modes, a robust cleaning method is 
to null those modes entirely, setting A — > oo artificially 
(Liu & Tegmark 2012). This is equivalent to fitting and sub- 
tracting those spectral modes from each line of sight, as 

N ls 

y c i ea n = Yl( 1 ~ v i v J')y' (9) 

3 

where the v, are normalized so that vfv, = 1. 

Here we have assumed that the foreground spectral func- 
tions are known in advance and can be projected out. This 
is essentially the same as arguments for subtracting polyno- 
mials or power laws along the line of sight. A crucial differ- 
ence is that there is no assumption of smoothness — if the fore- 
grounds were known to have a particular spectral shape, that 
could be represented in the vectors Vi. The essential prop- 
erty of the foregrounds that allows for their removal is not 
that they are smooth, but that they are described by few func- 
tions. In the limit that the number of orthogonal functions 
removed approaches N„, all of the signal is removed because 
i = 1 . . . Ay} spans the space, assuming orthogonality 
of the Vi. The rank of bright foregrounds is the primary de- 
terminant of cosmological signal that can be extracted. 

Subtraction of bright foregrounds has proven to be signif- 
icantly more challenging in practice due to instrumental ef- 
fects (Switzer et al. 2013). Fig. 3 shows an example of 0.1% 
calibration for a power law foreground spectrum similar in 
amplitude to the one reported by Voyteketal. (2014), and 
scaled to the slightly different frequencies of interest here. 
This represents the case where the Vi mode removed assumes 
a pure power law, but the actual measurement reflects the in- 
strument’s response to Vi. Residuals are considerably larger 
than the signal. 

One approach to treating the passband is to control it 
through instrumental design, and to measure it very precisely 
prior to the experiment. In principle, this could be achieved 
by long integrations on a calibration reference that is known 
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Figure 3. Impact of calibration uncertainty. Top: a power-law foreground 
spectrum similar to the one observed in Voytek et al. (2014), multiplied by a 
0.1% calibration error. Bottom : residuals when a smooth power-law spec- 
trum is subtracted from data with a calibration error. For contrast, the global 
signal at these redshifts is expected to be ~ 30 mK. Also note that calibration 
error will not integrate down thermally. 



to be smooth. This carries the challenge of calibrating on 
a source other than the sky. An alternative is to recognize 
that the foregrounds are a bright spectral reference, viewed 
through the same survey program as the primary science. One 
calibration scheme would then consist of integrating down on 
bright synchrotron emission and renormalizing the spectrum 
under the assumption that the emission is spectrally smooth 
(Voytek et al. 2014). 

An alternative to calibrating the instrument to meet the 
smooth mode functions {vi\ is to adjust the mode functions 
to reflect imperfections in the instrument. In the next sec- 
tion we develop a method for discovering foreground spectral 
mode functions in the data based on their spatial variations. 
The central idea is that the cosmological signal is a monopole 
spectrum, so the spectrum of anything that fluctuates across 
the sky must reflect some foreground modes (hence the ti- 
tle “Erasing the variable”). If there are monopole foreground 
components, they are formally indistinguishable from the cos- 
mological 21 cm monopole in this picture (in the absence of 
prior information). 

Measurement of {?;, } modes within the data may also dis- 
cover unanticipated or unconstrained instrumental response 
systematics. Examples include 1) a passband that varies with 
time or pointing, which causes a response to synchrotron ra- 
diation that varies across the sky, 2) polarization leakage, 
which can cause Faraday (spectral) rotation that varies across 
the sky, 3) terrestrial interference that varies with instrument 
pointing, and 4) un-modeled frequency dependence of the 
beam. Even if the foregrounds were intrinsically a simple 
rank-1 spectral function, the instrument response will tend to 
proliferate the rank of the foreground covariance. This limits 
the ultimate sensitivity to the global signal. While many of 
these sources of systematics may be tightly controlled by the 
instrument’s construction, residual imperfections will not be 
known in advance and could be discovered by this method. 

We will assume throughout that the true synchrotron fore- 
grounds are described by a small number of modes Af gjtr ue- 
The instrument observes these bright foregrounds and records 
spectral data that may need some N[ g > N{ gtruc functions to 
describe the contamination. 


FOREGROUND DISCOVERY FOR THE 21 CM MONOPOLE 


5 


3.2. Covariance from the measurements 

This section considers the joint determination of the global 
signal amplitude a and the contaminated foreground modes. 
The statistical methods described and extended here were de- 
veloped by Rao (1967). The central idea in these methods is 
to begin with a poor, contaminated estimator for the 21cm 
signal amplitude. This initial estimator can then be cleaned 
up by projecting out correlations with the foregrounds. 

To reflect the reality that often one does not know the co- 
variance a priori , first consider a simple estimator that does 
not use any information from the spectral covariance. This 
will serve as our starting point for constructing an estimator 
that simultaneously estimates the spectral covariance from the 
data. Again, let x be the spectral template of the cosmologi- 
cal signal that we would like to constrain. The simplest (and 
arguably worst) estimator we could develop for the amplitude 
is 

T(|| x) = (x T x)~ l x T y, (10) 

which takes the dot product of the 21cm template and ob- 
served data (x T y) and normalizes through (x T x)~ 1 . We re- 
fer to this as T(|| x) because it estimates the amplitude of the 
component of y parallel to the signal x. This is exactly the 
estimator that would be used if there were no frequency cor- 
relations in the noise. What makes this a very poor estimator 
is that it is strongly correlated with the foreground modes, and 
will therefore contain strong foreground residuals. In contrast, 
Eq. (5) in the previous section used the known frequency co- 
variance to de-weight the foreground contamination. 

We can begin to improve upon our simple estimator by con- 
structing spectral modes that contain no cosmological signal 
by design. Using a Gram-Schmidt process, we can make a set 
of vectors z-i that span the space orthogonal to the theoreti- 
cal template of the cosmological signal x, so that zfx = 0. 
There are N„ — 1 such vectors, and the choice of these vec- 
tors is completely arbitrary at this point. Any spectrum can be 
represented as a sum 

y = ax + y^ pjZj ( 11 ) 

i 

because { x , z, } span the 7V„-dimensional space. We can pack 
these vectors z % into a matrix Z and write an estimator for all 
the spectral information orthogonal to x, T(J L x) = Z T y. 
This represents all the components of y orthonormal to the 
signal, and is dominated by foregrounds. (Note that we 
could choose to normalize this for general vectors z and find 
(. Z 1 Z)~ l Z T y. This would make the equations more com- 
plex, but not modify the estimator for the cosmological quan- 
tity.) 

The covariance between T(|| x) and T(_L x) is 


(and no cosmological signal) is correlated with our estimate 
of the cosmological signal, so the latter must be contaminated. 

With knowledge of spectral correlations, we can form an 
improved, adjusted estimator 

a = T(\\x)-C l± Cf\T(±x) (13) 

that projects foreground frequency correlations out of the T(|| 
x) estimator. Intuitively, this estimator instructs us to make 
an estimate T(_L x) of the portion the measurement that is 
known to contain only foregrounds. Off-diagonal elements 
of the covariance matrix then allow the level of foreground 
leakage (into our estimate T(|| x) of the cosmological signal) 
to be predicted and subtracted off. A similar technique was 
used recently in a power spectrum analysis of PAPER data 
(Parsons et al. 2013). 

So far, we have assumed that E is known, but the same 
covariance adjustment can be performed with respect to an 
estimated E. Without perfect knowledge of S, the error bars 
grow but the estimator remains unbiased. The (is, o') covari- 
ance can be estimated empirically using 

N e 

£ = (N e ~i)- 1 J2(y i -y)(y l ~y) T d4) 

i=l 

Writing out Eq. (13), 

a = (x T x)~ 1 x T (1 — EE )y (15) 

where II = EZ(Z T EZ) _1 Z T . Remarkably, under our cur- 
rent assumptions where Z spans the rest of the N v dimen- 
sions, this estimator is equivalent to (see Appendix A) 

a = (x T 'E~ 1 x)~ 1 x T 'S~ 1 y, (16) 

which is precisely the same as Eq. (5), except with the esti- 
mated spectral covariance E replacing the known covariance 

E. 

Gleser & Olkin (1972) develop the same result through a 
completely independent method. They write down the joint 
likelihood of the covariance and mean (cosmological signal), 
where the covariance is Wishart-distributed and the mean is 
normally distributed. They then maximize the likelihood and 
find the same result, showing that the choice E — > E coin- 
cides with the maximum likelihood. 

In summary, the proposed procedure is to 

• Find the sample mean and (is, is') covariance across ob- 
served lines of sight. 

• Invert the measured covariance and use it to down- 
weight contaminated modes in the data. 


Cov||,x = 




( 12 ) 


f(x T x) 1 x t Tix(x t x) 1 (x T x) 1 x T HZ\ 

^ Z T 'Sx(x T x)~ 1 Z T EZ ) ' 


The off-diagonal terms (x T x)- 1 x 1 E Z represent correla- 
tions between T(|| x) and T(_L x) caused by the (is, is') co- 
variance of foregrounds in E. If the noise terms were only 
thermal fluctuations, E = of nst l, and by the construction 
x 1 Z = 0, the off-diagonal terms vanish. In other words, an 
estimator T(_L x) that is known to contain only contaminants 


• Find the inner product of the cleaned data and the signal 
template, then normalize. 

While the maximum of the likelihood has the same form 
as the case where the covariance is known, the distribution 
of the estimated 21 cm global signal template amplitude is no 
longer Gaussian, and is generally broader. These changes are 
due to the fact that the data are also used for the covariance 
estimation, which uses up degrees of freedom in determining 
the foreground modes. Rao (1967) shows that 

Var( “ } = N^-^r [C ^ " C HA c 'lf± c '-L,||]> (17) 
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where r is the rank of the additional degrees of freedom that 
are estimated. In our case here, r = N v 1 (Z spans the rest 
of the spectral space) so that 

Var(d) = (l + ) {x T tr 1 xY t , (18) 

where we have also plugged in the explicit forms for Cy m, 
C,| x ,C ±t± , and C j_ m from Eq. (12). We see that the co- 
variance is enhanced relative to the case of perfect foreground 
covariance knowledge [Eq. (7)] by a factor related to the num- 
ber of frequencies and independent lines of sight. The er- 
ror diverges when Ng approaches N v from above. The rank 
r(£) < min (Ng,N v ), and for Ng N v , the covariance is 
well-measured. This suggests that the optimal limit is to have 
many more resolution elements than spectral bins. 

There is little instrumental limitation on the number of 
spectral bins N„ over the bands observed. In contrast, there 
is a hard limit on the number of “independent” lines of sight 
that can be observed, Ng. At the frequencies of interest for the 
global 21 cm signal, the beam size tends to be large by diffrac- 
tion. Angular correlations and noise generally limit the num- 
ber of independent samples of the spectrum (see Sec. 7.1). 

The requirement Ng > N v stems from the fact that £ is 
trying to estimate a full N v rank covariance. In general, the 
sensitivity to a should not depend on the number of spectral 
bins in the survey, once they become fine enough to resolve 
the signal. The solution is to instead estimate some number 
iVfg of contaminated spectral functions, less than the number 
of spectral bins. Then the estimation becomes independent of 
the N v , so long as N v > Nf g . 

3.3. Empirically-determined foregrounds of limited rank 

The key step in the covariance adjustment scheme of the 
previous section was the formation and application of II = 
SZ(Z T SZ) _ 1 Z T , which modeled contaminated modes in 
the original “poor” estimator for later subtraction. The modes 
formed a basis Z orthogonal to the signal x. The specific 
choice of vectors in Z was arbitrary so long as they were 
orthonormal (for simplicity) and spanned the spectral sub- 
space orthogonal to the signal. For concreteness, we sug- 
gested forming this basis blindly using a Gram-Schmidt pro- 
cess. While formally a solution to our problem, this is not 
a particularly efficient way to implement our recipe in prac- 
tice, for each of the resulting basis vectors will be a linear 
combination of noise and foreground modes. On the other 
hand, the foregrounds should be describable by a small num- 
ber of modes (Liu & Tegmark 2012; Bernardi et al. 2014). A 
sensible alternative would therefore be to intelligently parti- 
tion the basis Z into two sub-bases, one that is of a relatively 
low rank _ZVf g containing the foregrounds, and another of rank 
N v — Nf s — 1 consisting of the remaining noise-dominated 
modes. Computationally, this means that rather than estimat- 
ing the full Nv rank covariance from the data, we only need 
to estimate a subset of contaminated modes and their ampli- 
tudes. This limits the number of degrees of freedom r that 
need to be estimated from the data, which as we saw from 
Eq. (17), is crucial for keeping the final error bars small. 

In the rank-restricted foreground approximation, the data 
along a line of sight are the cosmological signal plus some 
amplitudes /3 times foreground modes F plus thermal noise, 
as 

(19) 


where we have taken the noise to be stationary for simplicity. 
The amplitude-ordered eigenvalue spectrum of the total co- 
variance £ in this case would show some large contaminant 
amplitudes followed by a noise floor set by a. 

Unlike Z of the previous section, the foreground mode vec- 
tors in F alone do not span the entire spectral subspace or- 
thogonal to the cosmological signal. The remaining portion 
of the space is spanned by basis vectors that describe instru- 
mental noise, under the assumptions of Eq. (19). These vec- 
tors can be formed, like before, using a Gram-Schmidt pro- 
cess (this time relative to the signal x and foregrounds F) 
and packed into a matrix G. The observed spectrum y can 
then be written as 

N fg N v 

y = ax + Pjfj + ( 2 °) 

3 = 1 j=Nf g + l 

which is to be contrasted with Eq. (11). 

Again, we can form poor estimators for the amplitudes of 
the signal (a = ( x T x)~ 1 x T y ), foreground modes (/3 = 
F T y ) and thermal noise modes (7 = G T y). To do so, how- 
ever, we must first specify how the foreground modes F are 
identified and defined, and in the following section we outline 
two different methods for this. 

3.4. Two outlooks on separation of signal and foregrounds 

The first method is closely related to the treatment that we 
have presented so far. In particular, we considered modes in 
the matrices Z (full rank) or F (finite rank) that represent 
components of the data that are orthogonal to the cosmolog- 
ical signal. There is, however, nothing preventing real fore- 
grounds from being parallel to the signal, and generally raf g 
will be the sum of nfgi\ x (foregrounds parallel to the cosmo- 
logical signal) and n f s ,± x (foregrounds perpendicular to the 
signal). For example, if the signal and foregrounds share a 
slowly varying spectral component, it will fall in rif g ^ x . 

An alternative is to form foreground spectral modes that 
best describe the covariance of the fluctuating terms on the 
sky, without regard for the cosmological signal. Then, the 
signal has a piece which is parallel to those foreground modes, 
and a piece which is perpendicular, or x — x\\p + x±p- 

In equations, these two methods are 

Method 1: y = x + nf g ^ x + nf gt ± x delete: n f s ,± x 

Method 2: y = x±p + x\\p + ri[ g delete: x^p + nfg. 

In the first method, we can develop a cleaning operation that 
removes/deweights nf g ± x , that is, components of the fore- 
ground variance orthogonal to the signal. This estimator does 
not touch the signal by design, so is guaranteed to have no cos- 
mological signal loss. However, substantial foregrounds will 
remain in the estimated signal with this rather conservative 
approach. In the second method, we can develop cleaning that 
removes any foreground spectral modes that vary spatially on 
the sky, along with any component of the signal parallel to 
that. If implemented carelessly, this aggressive method will 
entail cosmological signal loss, although the problem is recti- 
fiable. 

Method 1 is described in Rao (1967), and is a simply a 
slight modification of the prescription in Sec. 3.2. Method 
2 is the one we develop here and advocate for global 21 cm 
signal recovery. 

We demonstrate the differences in approach using simple 
simulations. These have Ng lines of sight and a number of 


y = ax + F/3 + n t => £ = FTF t + of nst l. 
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Figure 4. A simple simulation of 10 lines of sight with a random combina- 
tion of three power laws. 


power laws in the synchrotron emission (to limit the rank). 
The indices of these power laws are {—1.5, —1.8, —2} and 
the amplitude is uniformly distributed and positive. Again, 
as amplitudes we take representative values from the recent 
observations of Voytek et al. (2014) and scale to the slightly 
higher frequencies of interest here. For the purposes here we 
will neglect thermal noise, and assume the integration is deep 
relative to the cosmological signal. Fig. 4 shows one realiza- 
tion of the foreground model. This model is only meant to be 
pedagogical, showing the main point of the algorithms here 
for independent sightlines in a simplified setting with three 
spectral modes. Beginning in Sec. 5 we will use the full ex- 
tended GSM model data cubes described in Sec. 2. 


3.4.1. Method 1: Foreground modes orthogonal to signal 

To find foreground spectral modes that are orthogonal to the 
cosmological signal x, we can find the largest eigenmodes of 
the restricted covariance 


S 


_Lje 


[1 — x(x T x) 1 x T ]£ 1 


( 21 ) 


which in practice will be eigenmodes that represent fore- 
grounds. By construction, any eigenmode f i of S| ±a _ will 

be orthogonal to x, i.e. fj x = 0. Let F± hold the restricted 
eigenvectors The top left subplot of Fig. 5 shows the unre- 
stricted eigenvectors of an input foreground simulation, while 
the bottom left subplot shows the restricted eigenvectors rela- 
tive to a signal template. 

Having identified our foreground eigenvectors, we can con- 
struct our estimator for the signal amplitude a using the meth- 
ods introduced in Sec. 3.2. Once again, we first form a poor 
initial estimator of the signal, which we then adjust using the 
covariance between the poor estimator and the foregrounds 
(which is non-zero because a substantial foreground contam- 
ination remains in the estimate of the signal). The covariance 
of this estimator takes the form 


p _ f (x T x) l x T £x(x T x) 1 ( x T x ) 1 X T £iFj_ \ 
OY a,0 - ^ F T 1 £,x{x T x)- 1 Fl£F ± ) ’ 

( 22 ) 

which looks exactly like Eq. (12), but with all occurrences of 
Z replaced by F \ . This is unsurprising, since the modes in 
G represent a diagonal piece of the covariance, of nst l, so are 


not correlated in frequency with either the foreground modes 
or the signal. 

Projecting out the piece of the signal estimator that is cor- 
related with the foregrounds, 

a= (x T x)~ 1 x T (l — H)y (23) 

n = EF 1 (f’{EF 1 )- 1 F{. (24) 

That is, project foreground-contaminated modes in the data 
and then dot against the signal template. The operation II puts 
the signal in the basis of foreground modes, weights those by 
their covariance, and moves back to the spectral space. Note 
that F± does not span the rest of the spectral space like Z 
in the previous section. Here, Eq. (23) is not equivalent to 
a = ( x T £~ 1 x)~ 1 x T £~ 1 y . 

By construction, our estimator suffers from no loss of cos- 
mological signal (despite the projecting-out of foregrounds) 
because (1 — II)a; = x — £Fj_(F 1 [£F±)~ 1 F , ^_x = x. 
In the last step, we used the fact that F^x = 0 through the 
choice of forming F with eigenvectors restricted to be or- 
thogonal to x, Eq. (21). 

The error on a is 

Var(d) cx [0,1,11 - C|, i± Cl^C7 Xi ||] (25) 

oc (x T x)~ 1 x T (l — U)£x(x T x)- 1 . (26) 

Here x T — H)£x can be interpreted as projecting all of the 
foreground modes orthogonal to the signal out of the (V, z/) 
covariance and finding the noise with respect to that resid- 
ual covariance. The residual covariance originates from fore- 
ground components parallel to the signal and from thermal 
noise — any (i/, v') covariance component with a non-zero dot 
product into the signal x will contribute to the variance of a. 
This property can be seen in Fig 5. The recovered signal has 
exactly the shape of the signal template x, but the amplitude is 
dominated by foregrounds. This feature is clearly undesirable 
in the regime where foregrounds vastly exceed the signal. 

Another undesirable property of this estimator is that the 
quoted error Var(d) relies on the foregrounds being normally 
distributed. Real foregrounds are strongly non-Gaussian, and 
we would like to avoid describing those components in the er- 
rors of a. We can make a simple modification to the estimator 
in Rao (1967) to treat both of these shortcomings. 

3.4.2. Method 2: Signal component orthogonal to foregrounds 

In some sense, the method described in Sec. 3.4.1 was 
too conservative. By limiting our labeling of foregrounds to 
modes that are orthogonal to the signal, we arrived at an es- 
timator with no formal signal loss, but one in which substan- 
tial foreground residuals remained in the final answer. Here 
we consider an estimator that more aggressively projects out 
foregrounds. The result will be a lossy treatment, but we will 
also show how this can be rectified. 

Instead of isolating the foregrounds orthogonal to the sig- 
nal, consider the component of the signal that is orthogonal to 
the foregrounds. The signal is then partitioned as 

x = x\\ F + x± F (27) 

For this decomposition to be meaningful, we must once again 
decide on a definition for our foreground modes. We let F be 
the largest foreground eigenvectors of S (without the orthog- 
onality restrictions of the previous section). In the calculation 
for S, the mean y contains the cosmological signal and is 
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Figure 5. Modes discovered from the (i/, v') covariance of simple simulations with three synchrotron spectral indices (Fig. 4). These include only the foregrounds 
and simple signal model of Eq. (3) with A z = 0.5 and z r = 10.6. Top left: the largest three eigenvectors of the covariance 5] that fully describe the foregrounds. 
Top right: The filtered data (1 — FF T )y. The input signal in red is largely recovered after cleaning the three foreground eigenmodes in the top left. In this 
toy model, the foregrounds are removed entirely, and only the signal (minus components parallel to the subtracted foregrounds) is left. Bottom left: The largest 
three eigenvectors of the restricted covariance X|aj = [1 — x(x T x)~ 1 x T ]F t . These are all orthogonal to the cosmological signal. Bottom right: The data after 
subtracting the restricted eigenvectors [or (1 — F ±F’±)y.\ are exactly the signal template shape, by the construction of the mode functions. The amplitude, 
however, is clearly dominated by residual foregrounds. These are the foreground components parallel to the cosmological signal. While this procedure is immune 
from loss of cosmological signal, it is insufficiently aggressive when dealing with foregrounds, and the final result is strongly influenced by the foreground 
covariance. We note that the apparent recovery of the reionization step’s location in the bottom right subplot is spurious, for it is almost entirely driven by our 
choice of which theoretical template we use. The location of the “step” in the top right subplot, on the other hand, is truly constrained by the data, although signal 
loss means that it no longer appeals as a clean step. 


subtracted out. For this reason, the cosmological signal does 
not perturb the (i/, i/) covariance or its eigenmodes. This is 
in contrast to inhomogeneous 21 cm mapping, where the cos- 
mological signal is stochastic, cannot be subtracted from the 
covariance, and does perturb the eigenvectors (Switzer et al. 
2013; Masui et al. 2013). (The foreground modes can op- 
tionally be isolated from thermal noise by forming the (z/, z/) 
cross- variance of maps acquired at two different times, as was 
done in Switzer et al. 2013.) 

To project the identified foreground modes out of the esti- 
mator, we apply 1 — 11 = 1 — FF . This operation splits the 
spectrum into a foreground contaminated and a (theoretically) 
foreground-clean subspace as 

( i n n ) y = ( 1 + ( n o s ) + ninst ’ (28) 


The overall error is now 


Var(d) 



N ig \ 
JVe-l-JVfJ 


[x T (l- FF 1 )x 


(30) 


To get some rough intuition, assume that the foreground 
modes all have about the same overlap with the cosmological 
signal. Then a; T (l — FF T )x oc N u — Nf g . As Nf g ap- 
proaches N v , the foreground modes do a progressively better 
job of spanning the N v spectral space. In the limit that they 
fully span the space (there is no signal distinguishable from 
foregrounds), the signal is also nulled and the error diverges. 
Alternately, as Nf g approaches Ng, the foreground discovery 
process uses up all the spatial degrees of freedom and the error 
diverges. In this formulation, then, we are self-consistently 
including the possibility of signal -loss in our error analysis. 


Unlike the previous estimator Eq. (23), however, the fore- 
ground modes are no longer constructed to be orthogonal 
to the signal. By projecting out the foregrounds, then, it is 
likely that some cosmological signal will be lost. Mathemat- 
ically, (1 — FF t )x equals x L f , not x, and the estimator 
a — (x T x)~ 1 x T (1 — II)y is an incorrectly normalized esti- 
mator of the signal amplitude (i.e., it suffers from a multiplica- 
tive bias). Correcting this, we propose the revised estimator 


x T ( 1 - FF T )y 
x T {\ - FF t )x' 


(29) 


In the error calculation from Eq. (26), the key term (1 — 
n)S now reduces to lof nst , under the assumption that all 
foregrounds are removed to a good approximation, and only 
Gaussian thermal noise remains. While the foregrounds may 
be strongly non-Gaussian, we will assume that the residuals 
after the modes F have been subtracted are Gaussian and due 
to the thermal noise floor. This would need to be verified in 
analysis of real data. 


4. CONSTRAINTS ON PASSBAND CALIBRATION 

The abstract design guidance suggested by this method is 
that 1) the experiment should not increase the rank of the 
foreground covariance (keep Nf e small) 2) if there are fore- 
grounds with iVf g spectral degrees of freedom, at least Ng = 
iVfg samples of its variations need to be observed if those fore- 
grounds are to be subtracted. 

A time-varying passband calibration is the worst instrumen- 
tal systematic in this sense. Here each line of sight sees a 
slightly different foreground spectrum and so requires a dif- 
ferent spectral function for cleaning. In the limit that each 
line of sight can have a different foreground, Nf s — > Ng and 
the errors diverge. In contrast, a constant passband error only 
modulates the foreground covariance spectral functions, but it 
does not increase the rank of its covariance. Its impact is more 
aesthetic, can be corrected by re-calibrating off the brightest 
mode, and it does not fundamentally limit the investigation. 
(This is proven in Sec. 6 using specific notation developed 
there.) 
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Fig. 6 shows the 10 simulated lines of sight with a constant 
10% calibration error. Here we multiply the spectrum along 
each line of sight by a constant 1 + 6, where S is /V^-long and 
normally-distributed. Fitting a smooth power law in this case 
would leave unacceptable residuals, but the modes are able to 
discover the constant calibration error and subtract the fore- 
ground. In contrast, Fig. 7 shows that even a 0.1% variable 
passband calibration error spreads foreground variance into 
many modes that are all significantly larger than the signal 
(10 -5 there). To implement a variable passband, we multiply 
each line of sight by a different 1 + 5,;, representing a change 
in the instrument response with time. 

The rule of thumb here is that if foregrounds are R times 
larger than the signal, then the per-pointing calibration needs 
to be stable to ~ 1/R. Averaging over sightlines tends to 
relax this constraint. In simulations shown in Sec. 7.1, we 
find that the stability must be better than ~ 10 -4 to have no 
effect. The constraint degrades rapidly above that. In the other 
limit of extremely high stability, we find that once calibration 
stability is suppressed below the thermal noise, there is no 
further gain in foreground discrimination. 

An important point is that the method developed here can 
discover any response to foregrounds that varies across the 
sky (non-monopole). This does not necessarily need to orig- 
inate from passband calibration or differential polarization 
response — these are just plausible sources of error from in- 
strumental response. 

5. INCORPORATING SPATIAL INFORMATION 

The previous discussion assumed that the foregrounds are 
stationary and independent between different lines of sight. It 
is for this reason that the first step in the prescription outlined 
above was to form y, the unweighted average of all lines of 
sight — with stationary statistics, there is no reason to prefer 
one direction of the sky over another, and an unweighted av- 
erage provides the best signal-to-noise. This is the implicit 
assumption that is being made by experiments that average 
over large regions of sky at once (such as single-dipole exper- 
iments) and produce a single spectrum as their measurement. 
The results in preceding sections therefore also apply to ex- 
periments with no angular sensitivity, except without angular 
information, error bars and foreground cleaning methods can- 
not be informed by the data itself and must be derived a priori. 

In reality, foregrounds are neither stationary nor indepen- 
dent. The synchrotron brightness varies across the sky (vio- 
lating stationarity), and in addition the foregrounds are known 
to be spatially correlated (violating independence). 

Angular information can be leveraged in several ways. 
First, non-stationarity of the foregrounds allows the estimator 
to down-weight parts of the sky that are known to be partic- 
ularly noisy, such as the galactic center. Second, correlation 
information allows foreground properties in one part of the 
sky to be inferred from observations of another part of the 
sky. For example, in the unrealistically extreme limit of per- 
fectly correlated foregrounds, a measurement of foregrounds 
in any part of the sky automatically allows a perfect prediction 
of foreground brightness in any other part of the sky. 

To take advantage of angular information, LI 3 considered 
an optimal estimator given full-sky maps at all frequencies of 
interest, as well as a known N u N p - lx x N v N p \ x covariance 
N between all pixels and all frequencies. Let d be a vector 
of length N v Ng containing the measured sky maps. Here we 
use iVpi x to denote the number of pixel indices in a full-sky 
healpix map (Gorski et al. 2005). For reasons we will see in 


Sec. 7.1, this also prevents confusion between the Ng inde- 
pendent sightlines in the previous section and the present dis- 
cussion, which is made more complex by beam convolution. 

The observations d are related to the global 21 cm spectrum 
s via the measurement equation 

d = As + n, (31) 

where n = rif g + rij nst is the generalized noise, containing 
nfg and ni nst as the foreground and instrumental noise con- 
tribution to d, respectively. The N p - IX N V x N v matrix A is 
given by 1 <gt eo, where 0 is the Kronecker product, 1 is an 
N v x N v identity matrix, and eo is the spatial monopole, i.e., 
an iYpi x -long vector of Is. Its function is to copy, at every 
frequency, the value of the global spectrum s to all the spa- 
tial pixels comprising the sky map. The maximum-likelihood 
estimate Sml for the full global spectrum is given by 

s ml = (A T N~ 1 A)- 1 A T N~ 1 d 1 (32) 

where N = ( nn T ) — ( n)(n) T is the covariance matrix of the 
generalized noise, which is assumed to be known. To facili- 
tate comparisons with previous work, we will begin by con- 
sidering the estimator s of the full global spectrum. Sec. 6 
returns to the template amplitude constraint. 

5.1. Spatial weighting to recover the spectrum 

L13 used a series of analytic manipulations to show that 
incorporating angular information via Eq. (32) could in prin- 
ciple lead to large reductions in foreground contamination. In 
practice, however, this prescription is difficult to implement 
for a number of reasons. First, one may not possess suffi- 
ciently accurate models of the instrument and the foregrounds 
to write down the matrix N , placing a priori expectations on 
the instrument response. This matrix cannot be derived from 
the data (unlike S from previous sections), since it has di- 
mensions N v Ng x N v Ng and therefore contains many more 
degrees of freedom than the number of measurements N v Ng. 
Moreover, even if the matrix is somehow known, its large size 
makes its inversion in Eq. (32) computationally challenging. 

To deal with these challenges, consider a modified recipe 
where the sky maps are dealt with frequency-by-frequency 
[unlike in Eq. (32) where all frequencies and all lines of sight 
are mixed together] and an estimator for the global signal at 
each frequency is formed by spatially averaging with non- 
trivial weights [unlike in Eq. (6), where different lines-of- 
sight were equally weighted]. Let dp be a vector of length 
iVpix that represents the map within d corresponding to the 
/3th frequency channel. The global signal at this frequency 
channel can then be estimated by computing the weighted av- 
erage 

sp = w'pdp, (33) 

where wp is a vector of length A ? p ; x with weight appropri- 
ate for the /3th frequency channel. Throughout this paper, we 
adopt a convention where summations are written explicitly. 
Repeated indices therefore do not imply summations. The 
variance of this estimator — which we will seek to minimize — 
is 

Var(s 9 ) = {s 2 p) - {sp) 2 = wj&pwp, (34) 

where Qp is the N p i x x -/V p i x spatial covariance matrix of the 
map at the /3th frequency. For our final estimator to be cor- 
rectly normalized, we require the weights at each frequency 
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Figure 6. A toy model showing that an absolute passband calibration error that does not vary with time does not impact primordial signal recovery. Top left: 
Observations of the spectrum across 10 lines of sight with a stable 10% passband calibration error between bins (here neglecting thermal noise). The input 
synchrotron is taken from a mixture of three power laws, so is described by three spectral modes in the data. Bottom left: Foreground modes discovered in the 
data. Because of passband calibration error, these modes are not smooth. A constant calibration error does not increase the rank of the three input power laws. 
Right: Recovered cosmological signal compared to the input signal, and the cosmological signal with the three modes on the left removed. The fixed calibration 
error is cosmetic. 



frequency using the inverse spatial covariance before sum- 
ming together different lines-of-sight and normalizing. Since 
the whitening takes the form of a non-diagonal matrix multi- 
plication, this operation not only down-weights brighter parts 
of the sky, but also makes use of angular correlation infor- 
mation to better estimate the monopole signal. Tegmark et al. 
(2003) tackled the transpose of this problem for CMB fore- 
ground removal. Note that in modeling <h j, it is essential to 
capture the non-stationary nature of our galaxy’s foreground 
emission. It is insufficient, for example, to describe the galaxy 
using an angular power spectrum alone, which assumes sta- 
tistical isotropy. This would not only be physically unrealis- 
tic, but would also make our weights constant across the sky 
(a fact that can be derived by applying Parseval’s theorem to 
our expression for wp), defeating the purpose of using spatial 
weights in the first place. 


Figure 7. Residual foregrounds as a function of modes removed in a toy 
model with three foreground degrees of freedom. Amplitudes are normal- 
ized to 1 for zero modes removed, and in rms rather than variance. In the 
case with no variable calibration error, the variance in the spectrum quickly 
reaches thermal noise after three modes are removed. If the passband calibra- 
tion varies between lines of sight, the residuals become significantly worse. 
Generally, for foregrounds 10 5 times larger than the signal, the passband 
should be stable to ~ 10 — 5 . Passband calibration instability spreads the 
variance of the three intrinsic foreground modes across many more degrees 
of freedom. 


to sum to unity, i.e. wj 5 e o = 1. To obtain the minimum- 
variance solution for our weights, we use a Lagrange multi- 
plier /i to impose our normalization constraint, and minimize 
the quantity 

L = wT&pwp — (iw'pe 0 . (35) 


Doing so gives 


and therefore 


wp = 


$/e 0 

T 1 t — 1 5 

' 0$/3 e 0 


b 


e o$p ld P 
e o$p le o' 


(36) 


(37) 


In words, these weights tell us to whiten the sky maps at each 


5.2. Spatial weights within a separable covariance model 

The spatial weight recipe that we have just specified can 
also be considered a special limit of the maximum-likelihood 
estimator, albeit with one small modification. Suppose that 
the full N v Ng x N v Ng covariance N is separable, so that 
we can write it as £ (§5 <E>, where £ is the N v x N v spectral 
covariance from previous sections, $ is an /V pix x N p - IX spatial 
covariance, and (g> is the Kronecker product. 

Using the identities ( G ® JT) _1 = G 1 ® H -1 and (G <g) 
H) ( J <g> K) = GJ <g) HK , we have 

A r N~ 1 = ST 1 ® e^" 1 (38) 

and 

(A t N~ 1 A)~ 1 = £ <8) (e^ $ -1 e 0 ) -1 . (39) 

Inserting these into Eq. (32), the final estimator in this sepa- 
rable approximation is then 


b 


sep 


eZ^dp 

e^-'eo ’ 


(40) 


which is almost identical to Eq. (37). If we slightly relax the 
assumption of perfect separability by allowing $ to acquire 
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a frequency dependence (so that $ — > <& ;), the correspon- 
dence becomes exact. Including this frequency-dependent 
non-separability also allows us to incorporate the fact that the 
instrumental beam will generally broaden toward lower fre- 
quencies, in addition to intrinsic non-separability of the fore- 
grounds. Sec. 6.1 discusses non-separability caused by the 
differences between foregrounds and thermal noise. 

Interestingly — but unsurprisingly — the spectral covariance 
£ drops out of the optimal estimator for the global spectrum 
once separability is invoked. As discussed in LI 3, this is due 
to the fact that once we spatially average over the maps, we 
are left with a spectrum consisting of N v numbers. If our 
final goal is to measure a cosmological spectrum, that is also 
of length N v , the constraint that our estimator be correctly 
normalized and unbiased means that there is nothing left to 
do. Mathematically, this manifested itself in our derivation 
when the factors of £ in Eqs. (38) and (39) canceled each 
other out when forming (A T N^ 1 A)~ 1 A T N^ 1 . 

If we return to our method in previous sections, however, 
and attempt to constrain the amplitude a of a theoretical 
template, the spectral covariance £ re-enters the discussion. 
When constraining a, one is seeking to measure a single num- 
ber from N v measurements, which in general ought to be 
combined non-uniformly if these measurements have differ- 
ent error properties (as captured by £). 

6. ANALYZING GLOBAL 21 CM DATA CUBES 

In this section we combine the finite-rank, in situ estimate 
of spectral foregrounds with a model of angular correlations. 
The end product will be an estimator for the amplitude of a 
cosmological signal template that can be applied to experi- 
mental global 21 cm data cubes. The angular part of the esti- 
mator uses assumed spatial distributions to improve the ampli- 
tude estimate. In contrast, the central aspect of this estimator 
is that it does not assume a set of spectral modes, or spectral 
covariance of the data. 

It is notationally convenient to write the yV p j x Avlong data 
matrix d by reshaping it into an N v x iV pix matrix Y. This 
“data cube” is a stack of maps at all the observed frequencies. 
The original data vector d can be recovered through the seri- 
alizing “vec()” operation vec(W) = d. This data cube form 
naturally accommodates weighing operations by the separa- 
ble covariance. A left-multiplication acts on the spectral di- 
rection, and a right-multiplication acts on the spatial direction. 

The essential weighting operation on the full dataset is 
7V _1 d, which under the separable covariance assumption be- 
comes 


now assuming a form for a; as a theoretical template). Here, 
the maximum-likelihood estimate for the amplitude of some 
spectral template x is 

a = (x T '£- 1 x) T x T '£- 1 Y<f>~ 1 e 0 (e£<f>- 1 e 0 )- 1 . (43) 

This applies the familiar maximum likelihood estimators for 
the spatial monopole (with respect to covariance <1>) and spec- 
tral template (with respect to covariance £) to the right and 
left side of our data cube Y. 

Following Sec. 3.2, we can seek to replace £ with a (zz, i/) 
covariance measured in the data, £. Kollo & von Rosen 
(2005) show that, again, this choice of weight yields the max- 
imum likelihood a. Again note that the errors on a will be in- 
flated (and non-Gaussian) by the fact that the estimation uses 
additional degrees of freedom in the data. As before, we want 
to find the sample variance with the monopole mean spec- 
trum subtracted so that the cosmological signal does not enter 
the covariance. Additionally, now that the angular covariance 
$ 1 is known, it can be applied in the covariance estimation 
as 

£ = Y& 1 [1 - e 0 w T ]Y T (44) 

w = $^ 1 e 0 (eg $ _1 e 0 ) _1 

where w is an A p j x -long weight map. 

The estimator for the monopole spectrum template ampli- 
tude is then 


a = {x t £- 1 x)- 1 x t £- 1 Yw (45) 

The inverse T* 1 of the A pix x N p [ x spatial covariance ap- 
pears explicitly only in the construction of the sample vari- 
ance. Elsewhere, it is collapsed onto the single simple weight 
map w. One approximation that can be made in practice is to 
replace <1> 1 with its diagonal, which simply weighs angles 
differently in the (zz, zz') variance estimation. 

Similar to the discussion in Sec. 3.3, we can replace the 
£ 1 operation by a projection that removes a finite number 
of contaminated spectral modes. This is equivalent to 1) iden- 
tifying a data subspace parallel to foregrounds that is contam- 
inated and fully removed, 2) keeping a data subspace orthog- 
onal to foregrounds. 

Let F contain the largest eigenvectors of the sample covari- 
ance £. Then 


x T ( 1 - FF t )x 


(46) 


7V _1 d = (£ ® €*)“ 1 y = £~ 1 Y$~ 1 (41) 

using identities of the Kronecker and vec operations. 

A model for the noise on Y is to draw some matrix E 
with shape N v x N pi X of normal deviates from N(p = 
0,cr 2 = 1), and then correlate the spectral and spatial parts 
as £ 1 ^' 2 E4> 1 / 2 to give a noise realization drawn from the co- 
variance £ ® <&. 

The data model in this presentation is 

Y = axe p + £ 1/2 .E$ i/ 2 . (42) 

This is a specialized case of the “growth curve” model 
(Kollo & von Rosen 2005), reviewed in Appendix B. We can 
transfer the separable method developed in the Sec. 5.2 to the 
present case of constraining the template amplitude a by let- 
ting s — y a and A = 1 ® eg —> tc®eo (since we are 


The denominator x T (l — F F r )x is a scalar normalization. 

This is analogous to Eq. (29), except that instead of a simple 
y, each spatial slice of the map Y is weighted to find the mean 
as Yw. 

Using this notation, we can easily prove that constant pass- 
band calibration errors do not increase the rank of fore- 
ground spectral modes. Let 1 + Sc be an A r „-long vec- 
tor that represents constant mis-calibration of the passband. 
If a constant calibration error multiplies all lines of sight, 
C = diag(l + (5c) modifies the data as CY . So long as 
the response is not zeroed out at some frequencies, C is in- 
vertible, and rank(C£C T ) = rank(£) for invertible C. In- 
tuitively, a constant passband error can be thought of as just a 
per-frequency rescaling of units. Since such a rescaling does 
not affect the rank of the foregrounds, there is no increase in 
the number of foreground modes that need to be constrained. 
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Our self-discovery scheme for spectral modes will therefore 
be just as effective when constant passband mis -calibrations 
are present. 

6.1. Deriving the final estimator: a two-component 
covariance 

In deriving Eq. (46), the key assumption that we made was 
that the total covariance is separable into spatial and spectral 
components. This confuses foregrounds and thermal noise, 
which each have very different angular distributions. For ex- 
ample, we know that a synchrotron spectral mode is associ- 
ated with broad spatial distributions of galactic emission. In 
contrast, thermal noise is always uncorrelated across angular 
pixels, but may change in variance across the sky. In the limit 
that spectral foreground subtraction is perfect, only Gaussian 
thermal noise will remain. In contrast, the separable assump- 
tion would dictate that it has approximately the same angular 
correlations as the galaxy. This issue ultimately translates into 
the angular weight w that should be applied. 

The solution to this problem is to let the total covariance 
be non-separable, but comprised of independently separable 
foreground and noise components. Here, TV = TV f + N g 
where Nf = ® and N g = Y g Cg) <J> g . For con- 

creteness, imagine that the “f ’ component consists of mod- 
eled foreground modes in F, while the “g” component con- 
sists of any residuals after the projection operation. This is 
exactly analogous to the F/G split in Rao (1967) described 
in Sec. 3.3. If our discovered foreground modes happen to 
perfectly capture the true foregrounds, then the “g” compo- 
nent would consist only of Gaussian thermal noise. In gen- 
eral, however, we need not make this assumption. 

The central problem with the model TV = X (§) $ is that 
(1 — FF t ) only acts on the £ part, but not <f*. In contrast, 
in the two-component model, when (1 — FF T ) acts on the 
data, the remaining covariance is only Y g ® <t> 9 , e.g., it cor- 
rectly ascribes s spatial correlations to the modes that F 
removes. 

We can now reassess the w in Eq. (46) which yields the 
minimum-variance estimate of a. To clarify the discussion, 
letq= [x T (l — FF t )x]~ 1 (1 — FF t )x. The first step of Eq. 
(46) in this notation is to form q T Y, which can be interpreted 
as a series of estimators for a, one for each line-of-sight. 

With our two-component covariance, a little algebra reveals 
that the variance to minimize is given by 


covariance model. The new weights w reflect the spatial co- 
variance of residuals after the frequency mode subtraction 
rather than the spatial covariance of the sky itself. In spirit, 
this is reminiscent of the approach in LI 3, where a best-guess 
model of the foregrounds was first subtracted from the sky 
maps before the variance of the residuals was minimized. The 
difference here is that our best-guess cleaning is based on the 
data rather than a model. In addition, the estimator operates 
with separable spectral and spatial steps that are computation- 
ally trivial (a feature that we have managed to retain even with 
the non-separable two-component covariance of this section). 

6.2. Angular weighting 

If the F modes cleanly separate the finite rank foregrounds, 
the spatial covariance A> g that determines the angular weight 
represents Gaussian thermal noise. In this case $ g diagonal, 
and proportional to 7’ h 2 ys based on the radiometer equation. 
Experimentally, synchrotron emission on the sky dominates 
the receiver noise temperature, so the angular weight w is 
simply to divide by the synchrotron template-squared. 

In reality, there will not be a clean separation of foregrounds 
and thermal noise. This is related to the fact that there is 
not a rigorous way to determine the number of modes to re- 
move (Sec. 7.4). If the eigenvalue spectrum has a long tail, 
we can remove the obvious, high signal-to-noise modes, but 
some residual will remain. This means that the w needs to 
represent the angular covariance of the residual foregrounds. 
Following LI 3, we split the problem of non-stationarity and 
angular correlation of the angular covariance by writing 

= diag(m)Qdiag(m) T (50) 

where m is the foreground mean map. This is equivalent to 
dividing the foreground by the expected mean, and then en- 
coding the angular correlations of that normalized map in the 
matrix Q. Let these correlations be full-sky, isotropic and 
diagonal. If T is a matrix that converts from real space to 
spherical harmonics, then the correlations will be diagonal 
A = TQT t , where A is just some function of L The op- 
eration Tv 1 can be interpreted algorithmically as 

• Divide each spatial slice of the data by a foreground 
model map (a synchrotron template) 

• Transform to spherical harmonic space and de-weight 
£-by-f by we (defined below) 


Var(d) = ( q 1 Y, f q)(w T f-w) + ( q T Y g q)(w T $ g w ). 

(47) 

The first term is zero by construction, since q T Yf oc (1 — 
FF 1 )F = 0. Minimizing the remaining term subject to the 
constraint that the weights sum to unity requires minimizing 
the quantity 

L = ( q T Y g q)(w T <b g w ) - pw T e 0 , (48) 

where // is again a Lagrange multiplier. Since q 1 Y g q is just 
a constant, it can be absorbed into p by a simple rescaling. 
One then sees that L is identical to Eq. (35) except that <f> g 
(the spatial covariances of the residuals) takes the place of 
(the full spatial covariance at the /3th frequency). We may thus 
import our previous solution for the spatial weights to find 

w = le o( e o 1 e 0 ) _1 (49) 

In conclusion, we see that a simple re-interpretation of Eq. 
(46) is all that is required to accommodate a two-component 


• Transform back to map space and divide by the fore- 
ground model map again. 

Here the i weighting reflects the angular correlations of any 
residuals after the (1 — FF T ) cleaning step. This is also 
the step where galactic masks could be applied in real space 
through a choice of to - 1 that is zero in those regions. 

The choice of w is related to the optimality of the estimate 
of a. As a first pass, we can always use the Gaussian thermal 
noise assumption, which has we. = 1. An i-by-l weighting of 
the residuals then has the potential to improve the estimate of 
a. Detecting some angular power spectrum in the ( 1 — FF 1 }- 
cleaned map that is measurably different than Gaussian noise 
could indicate that a better weight we is possible. A simple, 
two-parameter model for the correlations is to let the residuals 
have some amplitude £ relative to thermal noise and correla- 
tion length a, as 


we 


£ e — a 2 ^+ 1)/2 + e e b e(e+i) 


(51) 
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Figure 8. Top: Spatial weights at 80 MHz for noise-dominated residuals 
(£ = 0.5; u = 20°). The main purpose of spatial weighting in this regime 
is to average down noise, so all the weights are positive. Bright regions 
of the sky (which cause more instrumental noise for sky-noise dominated 
instruments) are down-weighted. Bottom: Spatial weights at 80 MHz for 
foreground-dominated residuals (£ = 50; cr = 20°). In this regime, the 
spatial weights play a role in foreground subtraction, and thus the weights are 
both positive and negative, as different parts of the sky are differenced to miti- 
gate residual foregrounds. The beam size assumed in both cases is 6b = 10°. 
The weights are defined to sum to 1, but only the shape, not normalization, is 
relevant. 


Here we have assumed that the input data are deconvolved by 
the beam before applying angular weights (dividing by Be). 
The term t 0bf '^ :+ l> describes thermal noise in that case. By 
picking the a and £ to give the lowest error, we may give 
preference to parameters where the error spuriously scatters 
low. A better alternative is use the known instrumental ther- 
mal noise to inform £, and assume residuals in the data are 
correlated across the beam scale 0f, (not determining wi based 
on the error of a). Ideally, the residuals would be dominated 
by nearly Gaussian thermal noise, and we = 1 would be a 
sufficiently optimal weight. 

In Fig. 8 we show example spatial weights computed from 
Eqs. (49), (50), and (51) in two different regimes. The top 
panel shows a noise-dominated case (£ = 0.5) while the 
bottom panel shows a foreground residual-dominated case 
(£ = 50). In the noise-dominated case the spatial weights are 
designed to simply average down the noise, and are therefore 
all positive. The galactic plane and the bright point sources 
are seen to receive almost zero weight, since they contribute 
the most thermal noise in a sky-noise dominated instrument. 
In the foreground-dominated case, the spatial weights attempt 
to further suppress residual foreground contamination. There 
are thus both negative and positive weights, since different 
parts of the sky can be differenced to subtract off foregrounds. 
In both regimes, the right half of the sky is given more weight, 
since the galaxy is dimmer there (see Fig. 2). 

An important property of our spatial weighting scheme is 
that the normalization is irrelevant, and only the shape mat- 


ters. This is a particularly attractive property, since most mod- 
els of the sky at our frequencies of interest are based on in- 
terpolations and extrapolations from other frequencies. The 
amplitude may have large uncertainties, but the shape infor- 
mation is likely to be much more reliable. If one is particu- 
larly confident about the available shape information, it may 
be prudent to go one step further and to set wi — o = 0 by 
hand. Examining Eqs. (50) and (51) reveals that this projects 
out any component of the measured sky that has precisely the 
same spatial shape as our foreground model. Note that even 
though this is the £ = 0 weight, we do not destroy the global 
monopole signal that we seek to measure, since the we act in 
a pre-whitened space after dividing by m. The global signal 
therefore takes the form 1/m and is no longer just the i = 0 
mode. It is instead spread out over a wide range of l val- 
ues. Setting we — o = 0 then sacrifices the sensitivity in one of 
the modes within this range, resulting in slightly increased er- 
ror bars as one deviates slightly from the optimal prescription 
described above. However, this may be a cost that is worth 
bearing for the sake of robustness in an aggressive campaign 
against foreground systematics. 

6.3. Summary of the proposed algorithm 

The final algorithm suggested here for analyzing global 
21 cm data cubes is 

• Use a spatially-weighted average to project the 
monopole spectrum out of the map and find the (y, v') 
sample covariance. 

• Find the largest eigenvectors of the sample covariance 
and project them out of the map. 

• Combine lines of sight using a prescription for the an- 
gular correlation of the residuals (which conservatively 
is to divide by the synchrotron template-squared). 

• Dot this against the signal template to find the ampli- 
tude and perform error analysis. 

While this algorithm is intuitive, our goal here has been 
to describe some of the implicit choices: 1) discovering fre- 
quency modes within the data (and implications for errors), 2) 
choosing to remove the part of the signal parallel to the fore- 
ground modes, 3) assuming a two-component separable form 
for the thermal noise and foregrounds/respectively. The next 
section describes several considerations for using estimators 
of this type, and challenges in global 21cm signal measure- 
ment. 

7. CONSIDERATIONS FOR ANALYZING GLOBAL 21 

MAPS 

In the next few subsections, we will consider 1) the no- 
tion of resolution elements and the choice of beam size, 2) 
combined passband stability and resolution considerations, 3) 
challenges in error estimation from residual monopoles, 4) de- 
termination of the number of modes to remove, 5) extensions 
to simple amplitude constraints, 6) applications of our meth- 
ods to the pre-reionization absorption feature, and 7) com- 
plications in foreground mitigation that result from Faraday 
rotation. 

7.1. Resolution elements and foreground modes 

In Sec. 3, the data were spectra of independent lines of 
sight. There, the number of foreground modes removed (JVf g ) 
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compared to number of sightlines (Ng) entered the errors as 
~ (7Vf g — Ng ) _1 . The sense of Ng is more complex in our 
present case of an N v x iV p j x Y data cube where the spatial 
part of sky signal is convolved by an instrumental beam,. 

Intuition suggests that the number of independent sight 
lines is approximately the number of beam spots on the sky, 
but this is incorrect. If each spatial slice of a noiseless full- 
sky map is written in terms of its spherical harmonics 7) m , 
convolution by the beam is multiplication by Be. Unless the 
beam falls to zero somewhere, this operation is invertible and 
so does not modify the rank of S. The rank of Y is _ZVf g so 
long as iVpi x > Nf g and iV„ > Nf g . Hence, all the spec- 
tral modes can still be recovered from the noiseless beam- 
convolved map. 

With wider beams, higher eigenvalue spectral eigenmodes 
(typically the less smooth modes) are washed out by the beam. 
These modes do not disappear entirely, but their amplitude di- 
minishes. We can develop a rough rule for this effect by find- 
ing the eigenvalue spectrum of full-rank white noise that is 
convolved by a given beam size. A fitting form for the sup- 
pression of eigenvalues (normalized to give no suppression 

for infinitely fine beams) is b n « io _ *' 6 ' PWHM//3CI0 1 W", as a 
function of eigenvalue n and full-width at half-max (FWHM) 
of the beam. 

At first sight, it would appear that a more rapidly falling 
eigenvalue spectrum is beneficial because it means that the 
foregrounds are better described by fewer covariance modes. 
When the suppression occurs because of beam convolution, 
however, it can be problematic in the presence of instrumental 
noise. With instrumental noise, the information about a mode 
may be suppressed below the noise floor. That foreground can 
therefore no longer be cleanly detected and removed, and may 
contaminate the signal and lead to larger errors. Such detec- 
tion and removal is potentially crucial, since higher eigenval- 
ues can still greatly exceed the signal (even after beam sup- 
pression) because of the hierarchy of scales. 

We therefore come to the conclusion that even if angular 
resolution does not impose a hard limit on the number of fore- 
ground modes that may be discovered (as suggested in the 
case of independent sight lines), it does put a practical limit 
on the efficacy of the cleaning. Note that the eigenvalues of 
S = Sf g + Ejnst will be different than Ef g . Adding noise has 
the effect of mixing the high-variance foreground modes with 
the thermal noise modes, making it harder to cleanly separate 
foregrounds. 

A rule of thumb for finding the number of modes dis- 
coverable before they are confused by noise is to plot the 
eigenvalue spectrum of noiseless foregrounds and the ther- 
mal noise, then find where they intersect. Modes some mar- 
gin above this point will still be well-measured. Consider a 
normalized eigenvalue spectrum of foregrounds that follows 
A n ~ b n ■ 10 -<m (Liu & Tegmark 2012). The eigenvalues of 
thermal noise are relatively flat in n in comparison. Finding 
the intersection with the noise level, the number of foreground 
modes that can be learned in the data is 


N ig 


log(FNR) 


f #fwhm 2 ,_i 
^ 300° ) /sky + a 


(52) 


where FNR is the foreground-to-noise ratio (in map variance). 
In our simple models here, a may be a factor of a few, for only 
a few modes in total. In this idealized setting, the rather le- 
nient scale of 300° in our fitting form for b n means that an 
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Figure 9. Error of the recovered cosmological template amplitude as a func- 
tion of calibration stability and instrumental resolution, where the fiducial 
amplitude is a = 1. The thermal noise is set so that in the limit of no cal- 
ibration or beam effects, the amplitude can be constrained to 5er. Here we 
consider foregrounds with four intrinsic spectral degrees of freedom. The 
constraint degrades for FWHM above ~ 50° and 10 — 4 stability in passband 
calibration. At low resolution, the modes can not be discovered. Calibration 
stability worse than the thermal noise produces many more modes that must 
be constrained and removed. The Monte Carlo estimates in the upper right 
become very noisy and we saturate the color scale. 



instrument could cover the full sky at poor resolution and still 
discover all the foreground modes. In reality, instrumental 
effects will drive a higher number of required modes. For ex- 
ample, Switzer et al. (2013) had to remove tens of modes to 
clean foregrounds even though only approximately four in- 
trinsic synchrotron modes were expected. 

An advantageous factor here is that the sky signal is con- 
volved by the beam while instrumental thermal noise is not. 
Hence, thermal noise can be estimated within the data using 
lVpi x independent samples. In an ideal experiment, all of the 
foreground modes can be discovered and subtracted in F so 
that only thermal noise remains. Then errors could then be 
assessed from the map’s thermal noise, and signal loss could 
be corrected against the subtracted modes as x T (1 — FF T )x. 

It is generally beneficial to cover larger areas with finer res- 
olution to better constrain and discover foreground modes. 
Higher resolution also helps focus de-weighting of particu- 
lar contaminated sky regions. An important exception to this 
occurs when the number of foreground modes scales with 
the FWHM or sky fraction. Both Faraday rotation and pass- 
band instability will generally increase the number of modes 
needed for studies on larger areas with smaller beams. Each 
new beam that is observed could see a new rotation measure, 
or be observed with a different passband. 

7.2. Summarizing the effects of passband stability and 
angular resolution 

Fig. 9 summarizes two of the main points of our paper. 
Here, we use the galaxy model described in Sec. 2 to perform 
a Monte Carlo simulation for the errors on a as a function of 
calibration stability and instrument resolution. Fig. 10 shows 
the four largest modes of this extended GSM model. All fur- 
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Figure 10. Spectral eigenmodes of the foreground model described in Sec. 2. 
The second, third, and fourth modes are 3 X 10 -4 , 8 X 10” 7 and 6 X 10 -8 
down from the first mode (in variance), respectively. 


ther modes of the model itself (without noise) are negligible, 
at the level of machine precision. Note that in contrast to the 
simple pedagogical models in Sec. 3 (with three modes), the 
full data cube simulation is more representative of the syn- 
chrotron sky. 

The sense of the passband calibration stability in our Monte 
Carlo is pixel-to-pixel (e.g., not beam convolved), so is not af- 
fected by the beam convolution operation. We set the thermal 
noise level to produce a 5<r detection of a in the absence of 
beam and calibration effects. For beams that are too large, 
not all foreground modes can be recovered and subtracted. 
In the simulations here with four modes, the constraint on a 
worsens gradually as the FWHM exceeds ~ 50°. For calibra- 
tion stabilities worse than ~ 10~ 4 , each line of sight responds 
differently to bright foregrounds, so requires a new spectral 
function to remove. As soon as these passband calibration 
perturbations exceed the noise floor, they rapidly degrade the 
constraint. 

Decreasing the thermal noise through longer integration 
times loosens the beam resolution requirement (because fore- 
ground modes are better measured). It does not modify the 
passband stability requirement. Passband stability is driven 
by the separation in scales between the foregrounds and cos- 
mological signal, and the fact that each line of sight has a 
different response to the foregrounds. Even if the modes are 
well-measured relative to thermal noise, they quickly over- 
whelm the signal. 

Throughout, we have assumed that the angular resolution 
at all frequencies is constant. This is difficult to arrange ex- 
perimentally. More typically, FWHM oc A by diffraction. 
In the frequency range considered here, a diffraction-limited 
FWHM would change by a factor of 2.5. Variation of the 
beam will mix spatial structure into frequency structure as 
different beams at each frequency see different angular con- 
tamination patterns (Morales et al. 2012). Implicitly, we have 
assumed that the beam is known well enough to convolve all 
the frequency slices to a common resolution. In contrast, if no 
correction is performed, the simulations described in Sec. 2 
with four modes will proliferate into 1 1 modes. These new 
spectral modes describe the spatial structure that is mixed in 
to the spectral direction. The frequency-dependent beam must 
therefore be corrected to achieve efficient cleaning. Errors in 
the beam model executing this correction then translate into 
new spectral contamination modes. 

The tolerance of beam models is experiment-dependent and 
is beyond the scope of the present work, but must be care- 
fully considered. We emphasize that frequency-dependent 
beam effects impact any global 21cm experiment, whether 


they choose to take advantage of angular information or not. 
The new spectral mixing modes reflect a best effort of our 
algorithm to describe new spectral covariance that must be 
cleaned to reach the 21cm monopole. Careful experimental 
design and measurement of the beam will control these co- 
variance modes (which remain inaccessible in a single-beam 
measurement). 

Here we have limited the intrinsic foreground rank to four. 
In reality, it may be discovered that foregrounds on the sky 
have higher rank, pushing the resolution constraint. Also, 
we assume a full sky here — an experiment covering a smaller 
fraction of the sky would need f~^ more resolution in our 
simple approximations here. 


7.3. Challenges in error estimation 

In the absence of prior information, monopole contamina- 
tion is indistinguishable from the signal. This contamina- 
tion may arise from a synchrotron monopole, or from ad- 
ditive 7’ rx (z./) of the receiver that is temporally constant but 
has some spectral structure. The foreground modes are esti- 
mated from the (v, v') sample covariance S of the map with 
the mean spectrum removed. The modes in F represent the 
frequency components of the spatially fluctuating part of the 
foregrounds, and the operation 1 — FF 1 is uninformed by a 
monopole signal. 

The monopole foregrounds may have significant overlap 
with the spectral functions of the spatially fluctuating fore- 
grounds. In our simulations here, the foregrounds are ran- 
domly drawn from a fixed set of spectral modes. For example, 
if there is spatially-varying synchrotron with a spectral index 
of —2, that synchrotron emission will also be subtracted from 
the monopole by 1 — FF 1 . 

In reality, we cannot rule out the possibility of a foreground 
monopole that remains even after cleaning the spatially- 
varying synchrotron modes. Further, the spectral pattern 
of constant instrument T rx (i/) will never be discoverable by 
mapping across the sky, and could be confused with the global 
signal. These present serious challenges for rigorous error es- 
timation. 

Performing the angular operations on the right of Eq. (46) 
first, the noise terms of a can be isolated as 


ttr — ttjtrue H - 


x T (l - FF t ) 
x t (1-FF t )x 


(rxfg -f riinst) 


(53) 


where n f g and ni nst are the foregrounds and thermal noise of 
the monopole through Yw. We lump the constant 7’ rx (z/) in 
with rifg rather than introducing a new term. 

While the thermal noise contribution oc x T (l — FF T )rii nst 
has a well-defined distribution and can easily be included 
in the errors, x T (l — FF T )ri{ s is much more challenging. 
If x T (l — FF T )n f g were always positive, then a could 
be interpreted as an upper bound. This practice is com- 
mon in treating foregrounds to the power spectrum because 
their contribution is always positive in the quadratic quantity 
(Switzer et al. 2013; Parsons et al. 2013; Dillon et al. 2014; 
Paciga et al. 2013). Here, we have no guarantee that the resid- 
ual foreground monopole dotted into the signal is positive. 
Recall that even though synchrotron may be a strong power 
law, the residuals here will largely represent unknown instru- 
mental factors that could dot with arbitrary sign with the sig- 
nal template. It has long been known that higher-order poly- 
nomials lead to better subtractions. This suggests that resid- 
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uals will generally produce two-sided errors. Further, any 
monopole residual foregrounds are likely to be non-Gaussian, 
inheriting from non-Gaussianity of the bright foregrounds. 

These “monopole” foregrounds do not strictly need to be a 
monopole on the sky — they only have to be constant over the 
area of the survey, if that only covers part of the sky. This 
again bolsters the argument that global 21cm experiments 
should observe as much of the sky as possible with modest 
resolution to maximize their sensitivity to spatial variations 
that could lead to significantly cleaner signal recovery. 

Barring any prior information about the foregrounds or in- 
strumental response, the best ways to immunize the analysis 
against this response are 1) to document that the spectral cube 
cleaned with (1 — FF T ) is consistent with Gaussian thermal 
noise, 2) to cover a large area of the sky with modest reso- 
lution to capture additional spatial variation of foregrounds. 
Additional information about spectral smoothness and in- 
strument response can be used (Bowman & Rogers 2010; 
Voyteketal. 2014) to further clean the spectral monopole. 
Quoted errors would be based only on thermal noise, and 
boosted to account for signal lost in the cleaning process, 
Eq. (30). Another possibility would be to conduct e.g. sur- 
veys of the North and South Celestial Poles with different re- 
ceiver architectures. If these were performed independently 
and had the same result up to thermal errors, it could lend 
some confidence that the any residual monopoles in the re- 
spective surveys are small compared to noise. 
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7.4. Determining the number of modes to remove 

Fig. 1 1 shows the fractional rms of the cosmological reion- 
ization signal remaining after removing the four foreground 
modes of the extended GSM (Sec. 2), shown in Fig. 10. We 
use the operation (1 — FF T ) described previously. The fore- 
ground spectral functions overlap strongly with the signal, es- 
pecially for slower, spectrally smoother reionization histories 
(high A z in the plot). Nulling four foregrounds causes sig- 
nificant signal loss over a range of reionization models. Any 
instrumental response that increases the rank of these fore- 
grounds will result in additional signal loss from the new 
modes it introduces. 

In the case of these simulations, we know the precise num- 
ber of spectral degrees of freedom. In experimental practice, 
the number of contaminated modes can not be determined di- 
rectly. This can be made especially ambiguous by an eigen- 
value spectrum that drops slowly rather than reaching a clear 
noise floor. 

A central challenge of any global 21cm signal estimator 
is the determination of the number of modes to remove (Rao 
1967; Kollo & von Rosen 2005). In similar methods applied 
to the 21cm autopower (Switzer et al. 2013), an advantage 
is that the bias from foregrounds is purely additive. In this 
case, an experiment can report the constraint with errors as a 
function of modes removed. This will generically fall as more 
contaminated modes are removed. This decline will level out 
if the foreground have been discovered. As further modes are 
removed, the errors will increase due to cosmological signal 
loss, which is self-consistently included in our formalism. A 
reasonable prescription for a bound is to find the number of 
modes which gives the most stringent upper bound. 

In the global spectrum case, foreground residuals after N 
spectral modes are removed can have either sign in the dot 
product with the signal template. This two-sided bias should 
stabilize as most of the foreground modes are discovered. For 


Figure 11. Fraction of the root-mean-squared of the reionization signal re- 
maining after the first four foreground modes are removed. The signal is 
given by Eq. (3) for a simple reionization scenario with central redshift z r 
and width Az. The first mode is a power law (see Fig. 10) and most of the 
signal remains for these reionization scenarios. The other three modes carry 
more redshift information, and tend to express signal variations at high red- 
shift better. In all cases, as the signal becomes smoother (A z increasing) it 
becomes harder to differentiate from the foreground modes. 

more modes removed, the errors will grow, but the central 
value should remain stable if most of the foreground covari- 
ance is described and removed. The summary of a global 
21cm experiment results should include plots of the spec- 
tral modes removed and characterization of the constraint as 
a function of modes removed. 

Both this issue and the residual monopole in Sec. 7.3 cannot 
be addressed rigorously. An irreducible challenge of global 
21 cm measurement is determining whether any foreground 
bias remains. The methods here provide excellent prospects 
for cleaning the data and guiding an experiment, but do not 
solve these central issues. 

7.5. Extending the spectral template 

We have discussed constraints on the amplitude of a cosmo- 
logical 21 cm signal template that is known in advance. The 
primary goal of the current generation of instrument is dis- 
covery of this amplitude. Subsequent experiments will then 
constrain the spectrum itself. The methods developed here 
can also be applied to that case. 

There is a continuum of models between the template esti- 
mate here and a full spectrum. One possibility would be to use 
prior information about the redshift of reionization and esti- 
mate two amplitudes, a(z < z r ) and a(z > z r ) then examine 
the significance of a step feature. Another option would be 
to constrain a fixed number of spectral spline amplitudes. In 
the limit that each frequency bin has a different amplitude, the 
methods developed here can find the component of that signal 
orthogonal to the identified foreground modes. 
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Figure 12. The same as Fig. 12, except that we now consider a pre- 
reionization absorption feature parameterized by a Gaussian. The duration 
is given as the cr-width of the Gaussian. Note that significantly more signal 
remains. 


The decision between models must juggle 1) use of prior 
information, 2) discriminatory power, 3) insight about the tra- 
jectory of the global 21 cm signal. This model selection prob- 
lem is deferred to future work. 

7.6. Application to pre-reionization absorption 

Throughout the paper, we have considered the global 21 cm 
signal from reionization. This choice was primarily pedagog- 
ical, to have a simple step around which the foregrounds are 
subtracted. The maximum contrast of this feature is ~ 30 mK, 
and for even modest A z, it becomes smooth and more easily 
subtracted by the foreground modes. In contrast, in the period 
prior to reionization, the 21 cm is thought to go into absorp- 
tion at ~ 100 mK (Pritchard & Loeb 2010). Here, the spin 
temperature is strongly radiatively coupled to the cold gas. 
However, while synchrotron radiation contaminating z ~ 10 
21 cm radiation around the time of reionization is ~ 500 K 
(averaged across the sky), the contamination of an absorption 
signature at z ~ 20 is ~ 3000 K. 

While the synchrotron emission from higher redshift 21 cm 
emission may be greater, the absorption signature has the ad- 
ditional feature that it is two-sided, both falling and rising. 
This generally makes foreground modes less parallel to the 


signal and leads to less signal loss. For our purposes here, 
we can again develop a simple model where r I\ is a Gaus- 
sian function parameterized by a central redshift and width 
a z . Fig. 12 is analogous to Fig. 11 except that it considers 
this era. Generally much more cosmological signal remains 
orthogonal to the foreground modes. 


7.7. Susceptibility to Faraday rotation 

In addition to the Stokes-I component of the foregrounds, 
synchrotron emission is also polarized. The polarized emis- 
sion is subject to Faraday rotation as it traverses various col- 
umn densities of free electrons in magnetic fields. If a re- 
ceiver were designed to be purely sensitive to Stokes-I, then 
this spectral fluctuation could be ignored. Generally though, 
a receiver will have some level of response to polarized sig- 
nals, and Faraday rotation produces a signal which oscillates 
in frequency. The rotation angle is (RM) • A 2 , where RM is the 
rotation measure. Even for modest RM, the rotation angle can 
vary rapidly over our band. This is especially problematic for 
the global 21 cm signal because the rotation measure varies 
over the sky, increasing the number of contaminated modes in 
the data. 

Extragalactic sources are subject to Faraday rotation 
through the screen of the entire Milky Way. This has been 
measured recently by Oppermann et al. (2012, 2014), and is 
shown in Fig. 13. Each point source will generally have 
a different rotation measure and require a new degree of 
freedom to describe. Bernardi et al. (2013) has conducted 
deep, wide-field observations of polarization of galactic syn- 
chrotron emission at 189 MHz. They find only one source out 
of 70 with S > 4 Jy with polarization at the level of 1.8% 
in 2400 deg 2 , with all others falling below 2% polarization 
fraction. 

Polarization and Faraday rotation of galactic synchrotron 
emission is more complex because synchrotron emission is 
interspersed with the Faraday screen. Bernardi et al. (2013) 
find peak polarized emission at ~ 13 K and rotation measures 
mostly below 10 radm' 2 . The low rotation measures and 
polarization fraction are thought to be due to a depolarization 
horizon (Landecker et al. 2010; Uyaniker et al. 2003) within 
the galaxy, a distance beyond which most of the emission is 
depolarized along the line of sight. 

The global 21cm experiments to-date have been single- 
element, making it difficult to make a precise comparison 
with interferometric galaxy polarization surveys conducted 
at similar frequencies (Pen et al. 2009; Bernardi et al. 2013). 
Vinyaikin et al. (1996) measured 3.5 ± 1.0 K at 88 MHz 
(24° FWHM, phased array) and 2.15 ± 0. 25 K at 20 0 MHz 
(8° FWHM, single-dish) for the value of \J Q 2 + U 2 toward 
the North Celestial Pole. 

To assess the galactic polarization field, we use the model of 
Waelkens et al. (2009), which self-consistently simulates the 
polarized emission and rotation measure. Fig. 13 shows the 
eigenvalue spectrum of the Stokes-Q spectral modes. Unlike 
Stokes-I, polarization is spectrally much more complex, and 
its covariance is spread among many modes. Ideally, all of 
these modes would be suppressed well below the noise level 
of the experiment so they do not need to be estimated and sub- 
tracted. Based on the measured amplitudes of the polarized 
signals above, suppose that polarization is 1 0 ;i of intensity. 
To achieve a 10 5 suppression, the polarization leakage must 
be kept below 1%. 







18 


Switzer & Liu 


Rotation Measure 



-100 100 



Figure 13. Top: Galactic rotation measure map of Oppermann et al. (2014), 
in units of rad m -2 , with color range truncated at ±100 rad m~ 2 to show 
more structure out of the plane. Bottom: The eigenvalue spectrum of Stokes 
Q emission in the galactic model of Waelkens et al. (2009), normalized to 1 . 
This component falls off very slowly because the rotation measure introduces 
many degrees of freedom that vary across the sky. 


8. CONCLUSION 

Measurements of the global 21cm signal are very chal- 
lenging due to 1) astrophysical foregrounds and their interac- 
tion with instrumental systematics, 2) terrestrial radio interfer- 
ence, and 3) the ionosphere. Here we have examined the first 
issue, especially in regard to instrumental response. Follow- 
ing z ~ 1 21cm literature (Switzer et al. 2013; Masui et al. 
2013; Chang et al. 2010), we develop a new method where 
foregrounds can be jointly estimated with the monopole spec- 
trum. This relies on the fact that foregrounds vary across the 
sky, while the cosmological signal is constant. This idea also 
extends LI 3, who argue that surveys with moderate angular 


resolution covering much of the sky are able to better discrim- 
inate between the monopole 21 cm signal and foregrounds. 

The key observation arising from our cleaning method is 
that the instrument should be designed to minimize the gen- 
eration of new spectral degrees of freedom from the fore- 
grounds. For example, if each line of sight has a slightly 
different passband calibration, it also requires a new spectral 
degree of freedom to describe the foregrounds there. In this 
sense, the instrument “spreads out” the variance over modes 
that ultimately require more aggressive cleaning and signal 
loss. In contrast, a constant passband calibration error does 
not increase the rank of the foreground spectral covariance, 
and its effect is primarily aesthetic. This is fortuitous because 
obtaining a smooth spectral calibration of an instrument at 
these frequencies would require very large, expensive struc- 
tures that are black in radio wavelengths. Simply requiring 
that galactic foregrounds be smooth to a few percent provides 
a sufficient passband calibration for signal recovery. In con- 
trast, significant effort must be put into maintaining passband 
stability to at least ~ 10 -4 . This may require using much of 
the instrument’s sensitivity to integrate against a stable refer- 
ence. 

Polarization to intensity leakage is another example of an 
instrumental systematic that increases the rank of the fore- 
ground covariance. It allows spectral oscillations in the 
Faraday-rotating polarization signal to contaminate the spec- 
tral intensity measurement. Because of depolarization, sky 
polarization is only a fraction of the intensity, and our esti- 
mates of the instrumental constraints are more lax, < 10 -2 . 

In attempting to measure the global 21 cm signal, one is 
faced with bright contaminating foregrounds that can inter- 
act with instrumental systematics in non-trivial ways, over- 
whelming the cosmological signal that one seeks to measure. 
In this paper, we have developed methods that bear simi- 
larities to various previously-proposed intuitive data analysis 
techniques. However, our methods arise from a rigorous, self- 
consistent framework that allows unknown and unanticipated 
foreground properties to be derived from real data. Such a 
framework also provides guidance for instrument design. If 
design requirements can be adequately met, high-significance 
measurements of the global 21 cm signal will be possible, pro- 
viding direct access to the rich and complex astrophysics of 
the first luminous objects and reionization. 
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APPENDIX 


A. RELATION BETWEEN COVARIANCE ADIUSTMENT AND THE MAXIMUM-LIKELIHOOD LINEAR 

ESTIMATOR 

In Sec. 3.2, we claim that the estimator m = (X t T,~ 1 X)~ 1 X t 'E~ 1 x is equivalent to the projection operation, m = 
(X T X)- 1 X T (1 — II )y where II = EZ(Z T SZ) _ 1 Z r . The two essential conditions here are that X T Z = 0 (that the signal 
and non-signal vectors are orthogonal) and that the vectors in X and Z together span the space. The operations X T and Z T can 
be understood as projecting onto the signal basis and the everything-but-signal basis, respectively. This proof follows Lemma 2b 
of Rao (1967). Writing out the terms, we would like to prove 

(x T ir 1 x)- 1 x T ir 1 = ( x T x)~ 1 x T - (x T x)- 1 x T 'z,z{z T s:z)- 1 z T . 


(Al) 
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Multiplying on the right by X is trivially true because Z T X = 0 and the other terms are the identity matrix. This checks just 
the X subspace of the equality. To prove general equality we need to check the Z subspace. Multiply on the right by Z and from 


the left by X T S 1 X (which will allow simplification) 

X t YT 1 Z = -{X T 'Z~ 1 X)(X T X)- 1 X T Y,Z(Z T Y,Z)- 1 Z T Z (A2) 

X t YT x Z = -(X T S- 1 X)(X T X)- 1 X T SZ[(Z T Z)- 1 (Z T SZ )]- 1 (A3) 

A t S- 1 Z[(Z t Z)' 1 (Z t SZ)] = -(X t '£- 1 X)(X t X)~ 1 X t 'EZ (A4) 

X t Y,- 1 [Z(Z t Z)- 1 Z t + X{X t X)~ 1 X t ]T,Z = 0. (A5) 


Because X and Z span the space, Z('Z 7 Z) 1 Z T + X (X r X) 1 X r = 1, then noting that X T Z = 0 proves the identity. In 
many of the methods in the paper, the projection II is reduced to (1 — FF T ), completely projecting out a set of modes. The 
modes in F clearly cannot span the space. Further, we let the modes F overlap with the signal. These choices make the estimator 
less formally optimal, but more robust to very bright, non-Gaussian foregrounds. 

B. EXTENDING THE MONOPOLE CONSTRAINT: THE GROWTH MODEL 

The separable model for the monopole can be extended to a more general spatial-spectral template scheme. Rather than a 
simple signal model axe^ , the mean of the observed data can be described by the outer products spectral modes u, and spatial 
modes Vj (not necessarily mutually orthogonal). In terms of the N v x A ? pix data cube Y, 

Y = UXV T = J2 x ijU l vj. (Bl) 

i,j 

This extended model could represent spatial and spectral templates of galactic emission in addition to the monopole signal. The 
noise terms in the model would then describe any residuals with respect to this model, written as 

Y = UXV T + Y, 1/2 E<f> 1/2 (B2) 

This separable estimation problem is considered in detail by Kollo & von Rosen (2005). (To reach their notation, right multiply 
by d> -1 / 2 .) Form the weighted (ta i/) sample covariance with the spatial components of the signal projected out, as 

S = Y<f>~ 1 {l-VW v )Y T (B3) 

where W v = (V T ^~ 1 V)~ 1 V T ^ 1 is the standard maximum-likelihood estimator for the amplitudes of the spatial vectors in 
V. The maximum likelihood estimator for X and full-rank spectral foregrounds S is 

X = {U T S~ l U)- l U T S~ 1 YW^r. (B4) 

Kollo & von Rosen (2005) describe extensions of this more general estimation problem to finite rank S. 
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